Linguistics and Literature Review (LLR) 







Volume , Issue 2, October 2015 







ISSN: 2221-6510 (Print), 2409-109X (Online)
Journal homepage: http://journals.umt.edu.pk/IJr/Home.aspx


Urdu Writing Rules for Online Input in PDA’s 


Fareeha Anwar 
S. Afaq Husain 


To cite this article: Fareeha Anwar & S. Afaq Husain (2015). Urdu Writing Rules for Online Input in PDA’s, Linguistics and Literature Review 1(2): 61-77.


Published online: October 31, 2015 




A publication of the 
Department of English Language and Literature 
School of Social Sciences and Humanities 
University of Management and Technology 
Lahore, Pakistan 


Copyright © Linguistics & Literature Review (LLR), 2015 


Urdu Writing Rules for Online Input in PDA’s 


Fareeha Anwar 
Department of Computer Science, International Islamic University, Islamabad, Pakistan


S. Afaq Husain 


Department of Computing, Riphah International University, Islamabad, Pakistan


ABSTRACT 


For online input, stroke-sequence-based recognition is generally employed. For this method, the stroke sequence must be uniquely defined for every character/ligature. Normally, every language has unique writing rules, which are followed by experienced users and recognition engines. However, writing style varies from person to person and place to place. Languages written from right to left, e.g. Urdu, Arabic and Persian, are complex and show many variations due to fonts and writing style. If rules are not followed properly, the recognition engine is bound to fail. Therefore, proper writing rules are necessary for the stroke-sequence-based online recognition of any language. No published and acknowledged rules are available so far for Urdu. This paper is an effort to accumulate writing rules for the Nastalique font for an online Urdu recognition engine.

Keywords: online Urdu


Introduction 


Writing on a tablet, PDA or any other online input device generates a sequence of strokes. These strokes are sent to a recognition engine, which accepts or rejects them based on predefined rules. As with Roman-based languages, different fonts are available for a single language. Each font has its own writing rules; these differ from one another but also influence each other, which causes confusion in recognition. People writing Nastalique may use Nasakh rules, and vice versa. If the stroke sequence does not follow the predefined writing rules, the rejection rate increases. If the user changes the direction of a stroke, e.g. writes a diagonal line from bottom to top instead of from top to bottom, its shape looks accurate but the stroke sequence is completely changed, so the online recognition engine will not be able to recognize it. Therefore, to improve the performance of the recognition engine, the user should follow the writing rules. We faced similar problems when developing an online recognition engine for handwritten Urdu characters and ligatures (Husain et al., 2007). The most common mistakes were lifting the pen (pen up) in the middle of a ligature/character, writing a ligature in the opposite sequence, and continuing to write where a pen up is required, which generates a duplicate sequence. Hence the need for predefined writing rules was felt. Unless these rules are known and users are trained according to them, the recognition engine is bound to fail or give a very high failure rate. While searching for writing rules for Urdu, we found a lack of well-published or publicized rules. We have therefore devised rules for Urdu that are intended to increase the recognition rate. These rules can also guide new users learning to write Urdu, or be used on a computing device for automatic learning, like a writing tutorial. There are 38 characters in the basic Urdu character set, given below.




Figure 1. Basic Urdu Alphabets 
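The introduction's point that a reversed stroke looks identical but yields a different stroke sequence can be made concrete with a direction encoding. The sketch below assumes an 8-direction Freeman chain code computed with y pointing up; the encoding and the name `chain_code` are illustrative assumptions, not the representation used by the engine of Husain et al. (2007).

```python
import math


def chain_code(points):
    """Encode a pen trajectory as 8-direction Freeman chain codes.

    Code k covers the 45-degree sector centred on k * 45 degrees, measured
    counterclockwise from the positive x-axis with y pointing up
    (0 = right, 2 = up, 4 = left, 6 = down).
    """
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if (x0, y0) == (x1, y1):
            continue  # repeated sample: carries no direction information
        angle = math.degrees(math.atan2(y1 - y0, x1 - x0)) % 360
        codes.append(round(angle / 45) % 8)
    return codes


# The same diagonal line, written top right to bottom left and then reversed.
diagonal = [(2, 2), (1, 1), (0, 0)]
print(chain_code(diagonal))        # [5, 5]: down-left
print(chain_code(diagonal[::-1]))  # [1, 1]: up-right
```

Both inputs trace the same pixels, but a matcher that compares code sequences sees two unrelated strings, which is why stroke direction has to be fixed by a writing rule.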


According to the revised extended character set, Urdu has a total of 58 alphabets (Zaheer et al., 2007). The new alphabet set of Urdu is shown in Figure 2. Urdu is a cursive language and is very difficult to recognize, as discussed by Starr (1985).




Figure 2. Character Set (58 alphabets) of Urdu Script. (Zaheer et al., 2007) 


Each character can take four different shapes, depending on whether it is isolated, at the beginning, at the end, or connected on both sides in a word (Reza et al., 2005). Therefore, each character has a different shape according to its position in a given ligature. Most Urdu characters have the same shape in a ligature given the same context. In the Nastalique style of writing, which Urdu follows, the 38 characters are therefore divided into 18 classes for easy and efficient recognition, as shown in Table 1.


Table 1. Classification of Urdu 


S.N  Class               S.N  Class
1    Alif                10   Fay
2    Kashti              11   Kaaf
3    Jeem                12   Laam
4    Daal                13   Meem
5    Ray                 14   Wao
6    Seen                15   Gol Hay
7    Swad                16   Do Chashmi Hay
8    Tua                 17   Choti Yaa
9    Aien                18   Bari Yaa
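For a recognizer, Table 1 amounts to a lookup from each character to its class. A minimal sketch, assuming characters are identified by the transliterated names used in the class descriptions of this paper (a real engine would more likely key on Unicode code points); `URDU_CLASSES` and `class_of` are hypothetical names, and the fourth Ray-class name, garbled in the source, is rendered here as "Zhaay" by assumption.

```python
# Table 1 as data: class number -> (class name, member characters), using the
# transliterated character names given in the class descriptions.
URDU_CLASSES = {
    1: ("Alif", ["Alif"]),
    2: ("Kashti", ["Bay", "Pay", "Taay", "Ttay", "Saay"]),
    3: ("Jeem", ["Jeem", "Chay", "Hay", "Khay"]),
    4: ("Daal", ["Dal", "Ddal", "Zal"]),
    5: ("Ray", ["Ray", "Aray", "Zay", "Zhaay"]),
    6: ("Seen", ["Seen", "Sheen"]),
    7: ("Swad", ["Swad", "Zwad"]),
    8: ("Tua", ["Tua", "Zua"]),
    9: ("Aien", ["Aien", "Ghaien"]),
    10: ("Fay", ["Faay", "Qaaf"]),
    11: ("Kaaf", ["Kaf", "Gaaf"]),
    12: ("Laam", ["Lam", "Nun"]),
    13: ("Meem", ["Meem"]),
    14: ("Wao", ["Wao"]),
    15: ("Gol Hay", ["Gol Hay"]),
    16: ("Do Chashmi Hay", ["Do Chashmi Hay"]),
    17: ("Choti Yaa", ["Choti Yaa"]),
    18: ("Bari Yaa", ["Bari Yaa"]),
}

# Reverse index: character name -> (class number, class name).
CHAR_TO_CLASS = {
    member: (number, name)
    for number, (name, members) in URDU_CLASSES.items()
    for member in members
}


def class_of(char_name):
    """Return (class number, class name) for a character; KeyError if unknown."""
    return CHAR_TO_CLASS[char_name]


print(class_of("Pay"))  # (2, 'Kashti')
print(class_of("Nun"))  # (12, 'Laam')
```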
Diacritics 


Diacritics are very important in Urdu. They include dots, Ttaay, Hamzaa, Diagonal and Madaa.




Figure 3. Diacritics/Aerab of Urdu (Aamir et al., 2001) 


Basic rules 


• Nastalique is written from top right to bottom left.
(1) Ka Ba (basic phonemes)


• Ligatures such as Chachi in example (2) are tilted at approximately 45 degrees. This is of particular significance, as no character has a fixed level or height with respect to the baseline.


(2) Chachi (Aunt)
• Some parts of a word are written at an angle of about 30 degrees to the baseline, as shown below (Reza et al., 2005).


(3) Sabr (Patience), Salamat (Mercy)
• Write words separately where possible, as shown below.
(4) Aap Nay JisKo (You who ever)
• Writers leave a space between two words.
(5) Sab Hum (We all)
• Long spaces should not be left between ligatures within a single word.
(6) Pakistan


• All cusps (shosha) should be drawn properly, and their length should be sufficient to distinguish the letters within a ligature.
(7) Shair (Lion), Liyay (For)
• If a word is composed of more than one ligature, the secondary strokes of the first ligature are drawn before the second ligature.
(8) Pani (Water)
• Secondary strokes should follow the proper sequence.
(9) Shair (Lion)
• A stroke written with a full qat (the length of the nib) never joins another full qat; rather, it always joins with a half-qat glyph.




Figure 3. (a) Full qat kashish joined with a half qat connector; (b) full qat kashish joined with a full qat circle
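Rule (8), that all strokes of one ligature (including its secondary strokes) are written before the next ligature begins, can be checked mechanically once each stroke has been attributed to a ligature. A sketch under that assumption; the (ligature index, stroke kind) input format and the name `follows_ligature_order` are hypothetical, and attributing strokes to ligatures is not shown.

```python
def follows_ligature_order(strokes):
    """Check that no stroke of an earlier ligature is written after a stroke
    of a later one.

    strokes: drawn sequence of (ligature_index, kind) pairs, where ligature
    indices count from 0 in reading order (right to left) and kind is a label
    such as "main" or "secondary".
    """
    indices = [ligature for ligature, _kind in strokes]
    # The order holds exactly when the ligature indices never decrease.
    return all(a <= b for a, b in zip(indices, indices[1:]))


good = [(0, "main"), (0, "secondary"), (1, "main"), (1, "secondary")]
late_dot = [(0, "main"), (1, "main"), (0, "secondary")]
print(follows_ligature_order(good))      # True
print(follows_ligature_order(late_dot))  # False
```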


Shapes of classes 


Each class contains one or more characters, which differ in the number and shape of their secondary strokes.


Alif class 

• The shape of Alif is a long vertical line (8 to 10 pixels), written from top to bottom.

• The shape of Alif remains the same at the isolated, start, middle and end positions of any ligature.

• It remains isolated when it comes at the start of a ligature.

• The stroke direction changes when it comes after another character, i.e. bottom to top instead of top to bottom.

• When Laam is followed by Alif, Alif is a slanting line from top to bottom joining the base of Laam, rather than the usual vertical line.


Kashti class 
Five characters are present in this class, named Bay, Pay, Taay, Ttay and Saay.


Basic shape 


A short vertical line (top to bottom), followed by a long horizontal line (right to left), then a short vertical line (bottom to top).
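The class descriptions specify each shape as a sequence of directed segments with an angular tolerance (written "+20" in places, read here as ±20 degrees). A template check for the Kashti basic shape might look like the sketch below. It assumes the pen trajectory has already been segmented into straight parts, and that angles are measured counterclockwise from the positive x-axis with y pointing up, so that right to left is 180° and top to bottom is 270°. This convention matches most angles in the paper (e.g. 270 for the top-to-bottom Alif-like stroke of Tua) but is our assumption; the function names are hypothetical.

```python
def angular_difference(a, b):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    diff = abs(a - b) % 360
    return min(diff, 360 - diff)


def matches_template(directions, template, tolerance=20.0):
    """Check measured segment directions against nominal template directions.

    directions, template: angles in degrees, counterclockwise from the
    positive x-axis with y pointing up (180 = right to left, 270 = top to bottom).
    """
    if len(directions) != len(template):
        return False
    return all(angular_difference(d, t) <= tolerance
               for d, t in zip(directions, template))


# Kashti basic shape: short vertical line down, long horizontal line right to
# left, short vertical line up. Segment lengths are not checked here.
KASHTI_BASIC = [270, 180, 90]

print(matches_template([265, 185, 80], KASHTI_BASIC))  # True
print(matches_template([90, 180, 270], KASHTI_BASIC))  # False: reversed strokes
```

Short versus long segment lengths would need a separate check on top of the directions.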


Shape at start position 


• If a Jeem-class character follows Kashti, its shape is a simple medium (5 to 6 pixel) diagonal line at 210±20 degrees (top right to bottom left).

• If Kashti is followed by a character with a loop, e.g. Fay, Aien, Qaaf or Wao, or by a Ray- or Yaa-class character, its shape is a simple Ray or half Kashti, i.e. a short vertical line (top to bottom, 90±20) followed by a short horizontal line (right to left, 180±20). There is no cusp after it.

• If Meem follows Kashti, its shape is a short curve or diagonal line (top to bottom) at 230±20 degrees.

• If Alif, Kaaf or Laam follows Kashti, its shape is a semicircle (right + downward + left + long upward).

• For all other cases, it is a semicircle (right + downward + left + upward), with a cusp at the end.
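The start-position rules for Kashti are a dispatch on the class of the following character. A sketch with hypothetical shape labels; the set of "loop" classes contains only the examples the rule names (Fay, whose class includes Qaaf, Aien and Wao), and reading "yay class" as both Yaa classes is an assumption.

```python
# Classes after which Kashti takes its half (Ray-like) form with no cusp:
# the loop-bearing examples named in the rule plus the Ray and Yaa classes.
HALF_KASHTI_BEFORE = {"Fay", "Aien", "Wao", "Ray", "Choti Yaa", "Bari Yaa"}


def kashti_start_shape(next_class):
    """Return a label for Kashti's shape at the start of a ligature,
    given the class name of the character that follows it."""
    if next_class == "Jeem":
        return "diagonal_210"        # medium diagonal, 210±20, top right to bottom left
    if next_class in HALF_KASHTI_BEFORE:
        return "half_kashti"         # short vertical + short horizontal, no cusp
    if next_class == "Meem":
        return "diagonal_230"        # short curve or diagonal, top to bottom, 230±20
    if next_class in {"Alif", "Kaaf", "Laam"}:
        return "semicircle_long_up"  # right, down, left, long upward
    return "semicircle_cusp"         # right, down, left, up, cusp at the end


print(kashti_start_shape("Jeem"))  # diagonal_210
print(kashti_start_shape("Seen"))  # semicircle_cusp
```

Per the middle-position rule that follows, the same labels would apply in the middle of a ligature with a cusp added at the start.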


Shape at middle position 
If Kashti comes in the middle, it is a semicircle (right + downward + left + upward), with a cusp at the start as well as at the end.

All rules for the shape at the start position also apply to the middle position, with the addition of a cusp at the start.
Shape at end position
At the end position, the shape is the same as the isolated shape but with a cusp at the start.


Jeem class 
Four characters are present in this class, named Jeem, Chay, Hay and Khay.


Basic shape: a horizontal line (4 to 5 pixels long) from left to right, followed by a downward curve from left to right (angles: downward 210±20 and upward 30±20).

Alternatively, a short vertical line from bottom to top (2 to 3 pixels long) at the start, with the rest of the shape the same as above.

Shape at start position

Half Jeem (the part before the curve), written in either of the two ways above, followed by the next character.


Shape at middle position 
• A medium diagonal line at 225±20 degrees, followed by a sharp edge and a diagonal line at 330±20 degrees, followed by a sharp edge and a diagonal line/curve at 210±20 degrees.

• If a Jeem-class character follows Kashti, its shape is half Jeem starting from the horizontal line, not from the vertical line.


Shape at end position 
• When it comes at the end, its shape is the same as the isolated shape, starting from the horizontal line, not from the vertical line.


Daal class 
Three characters are present in this class, named Dal, Ddal and Zal.


Basic shape 


A medium diagonal line at 315±20 degrees from left to right, followed by either a downward curve or a medium diagonal line at 210±20 degrees from right to left.


Shape at start position 
• Remains the same as the isolated shape.


Shape at middle position 
• Never occurs in the middle position of a ligature.
Shape at end position 


• At the end of any ligature, it starts with a cusp, followed by a medium vertical line (top to bottom) and then a medium horizontal line.


Ray class 
Four characters are present in this class, named Ray, Aray, Zay and Zhaay.
Basic shape 


Written from right to left, its shape is a medium vertical line at 250±20 degrees, followed by a horizontal line at 180±20 degrees.


Shape at start position 
• Remains the same as the isolated shape.


Shape at middle position 


• Never occurs in the middle position of a ligature.
Shape at end position 
• At the end of any ligature, its shape is a medium vertical line at 225±20 (top to bottom) followed by a medium horizontal line.
Seen class 
Two characters are present in this class, named Seen and Sheen.


Basic shape 


Two small semicircles (right + down + left + up) with a diameter of 2 to 3 pixels and two cusps, followed by a big semicircle with a diameter of 7 to 8 pixels.
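The Seen shape distinguishes small (2 to 3 pixel) from big (7 to 8 pixel) semicircles. A rough size check is sketched below; it assumes the semicircle is U-shaped (right, down, left, up), so that its diameter equals its horizontal extent. The function names are hypothetical, and pixel sizes will in practice depend on device resolution and writer.

```python
def horizontal_extent(points):
    """Width of a stroke's bounding box, in pixels."""
    xs = [x for x, _y in points]
    return max(xs) - min(xs)


def classify_semicircle(points):
    """Label a U-shaped semicircle by its diameter (its horizontal extent):
    'small' for 2-3 px, 'big' for 7-8 px, None if it falls in neither range."""
    diameter = horizontal_extent(points)
    if 2 <= diameter <= 3:
        return "small"
    if 7 <= diameter <= 8:
        return "big"
    return None


tooth = [(3, 2), (2.5, 1), (1.5, 0.5), (0.5, 1), (0, 2)]
bowl = [(7.5, 4), (6, 1), (3.75, 0), (1.5, 1), (0, 4)]
print(classify_semicircle(tooth), classify_semicircle(bowl))  # small big
```

Values between the two ranges (e.g. 5 px) return None, so a practical engine would need an explicit policy for them.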


Shape at start position 
• The first part of the shape, i.e. the two semicircles, remains the same, with two cusps. There is no final big semicircle.

• If a Jeem-, Meem- or Yaa-class character follows Seen, there is no cusp after the second semicircle; the shape of the next character starts immediately.




Shape at middle position 


• The start shape remains the same, but three cusps are present instead of two, including a starting cusp.

• All conditions of the start position remain the same.


Shape at end position 
• Remains the same as the isolated shape, but with three cusps instead of two (including the start cusp).
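The Seen rules distinguish positions partly by the number of cusps (two at the start, three in the middle and at the end). One simple way to count cusps is to look for sharp reversals of pen direction. A sketch; the turning-angle criterion, the 150° threshold and the name `count_cusps` are our assumptions, not values from the paper.

```python
import math


def count_cusps(points, min_turn=150.0):
    """Count cusps, taken here as vertices where the pen direction turns by at
    least min_turn degrees (a near reversal, as at the tip of a shosha)."""
    # Drop consecutive duplicate samples, which have no direction.
    pts = [p for i, p in enumerate(points) if i == 0 or p != points[i - 1]]
    cusps = 0
    for (x0, y0), (x1, y1), (x2, y2) in zip(pts, pts[1:], pts[2:]):
        before = math.atan2(y1 - y0, x1 - x0)
        after = math.atan2(y2 - y1, x2 - x1)
        turn = abs(math.degrees(after - before)) % 360
        turn = min(turn, 360 - turn)  # unsigned turning angle, 0 to 180
        if turn >= min_turn:
            cusps += 1
    return cusps


spike = [(0, 0), (0.1, 2), (0.2, 0), (1, 0)]
print(count_cusps(spike))  # 1
```

The expected counts from the rules above (two versus three) could then be compared with the detected count.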


Swad class 


Two characters are present in this class, named Swad and Zwad.
Basic shape 


Starting from the left, its shape is a diagonal line (2 to 3 pixels) at an angle of 30±10, followed by a small downward curve that moves back towards the starting location, making a loop. After the loop there is a small vertical line, followed by a cusp and a semicircle (down + left + up) with a diameter of 7 to 8 pixels, the same as the last part of Seen.


Shape at start position 


• The first part of the shape, i.e. the loop with the cusp, remains the same. Instead of the final semicircle, the shape is a small Ray with a cusp.

Shape at middle position


• Whenever it comes in the middle of a ligature, there is a pen up and the shape remains the same as the start shape.
• All conditions of the start position remain the same.
Shape at end position 
• Whenever it comes at the end of a ligature, there is a pen up and the shape remains the same as the isolated shape.
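Because a medial or final Swad requires a pen up, the engine should expect such a word to arrive as more than one stroke. A minimal sketch of how raw pen samples might be grouped into strokes; the (x, y, pen_down) sample format and the name `split_strokes` are assumptions about the input device, not something the paper specifies.

```python
def split_strokes(samples):
    """Group digitizer samples into strokes, starting a new stroke after each pen-up.

    samples: sequence of (x, y, pen_down) tuples in time order.
    """
    strokes, current = [], []
    for x, y, pen_down in samples:
        if pen_down:
            current.append((x, y))
        elif current:          # pen lifted: close the stroke in progress
            strokes.append(current)
            current = []
    if current:                # pen still down at the end of input
        strokes.append(current)
    return strokes


samples = [(0, 0, True), (1, 0, True), (1, 0, False), (5, 0, True), (6, 1, True)]
strokes = split_strokes(samples)
print(len(strokes), "strokes,", len(strokes) - 1, "pen-up(s) between them")
# 2 strokes, 1 pen-up(s) between them
```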
Tua class 
Two characters are present in this class, named Tua and Zua.


Basic shape 


There is a straight vertical line from top to bottom at an angle of 270±20, the same as Alif, followed by a cusp and a stroke the same as Dal with a curve (not with a diagonal line); it then moves back towards the starting location, making a loop. After the loop, there is a small horizontal line.


Shape at start position 


• The shape remains the same as the isolated shape.


Shape at middle position 


• Whenever it comes in the middle of a ligature, there is a pen up and the shape remains the same as the isolated shape.
• All conditions of the isolated position remain the same.
Shape at end position 
• Whenever it comes at the end of a ligature, there is a pen up and the shape remains the same as the isolated shape.
Aien class 


Two characters are present in this class, named Aien and Ghaien.
Basic shape 


Starting from the top right, a diagonal line/curve goes down and to the left at an angle of 225±20, followed by a small curve that moves back towards the right (the same shape as Dal but in the opposite direction), forming a small semicircle with a diameter of 3 to 4 pixels. Then there is a cusp and a big semicircle (down-left + right + up) with a diameter of 7 to 8 pixels.


Shape at start position 


• The first part of the shape, i.e. the small semicircle with the cusp, remains the same. Instead of the second curve, there is a small horizontal line/curve towards the left.
(Pen up) 
• If a Meem- or Yaa-class character follows Aien, the next character starts immediately after the first part of the shape.


• If Aien is followed by a Jeem-class character, there is a small diagonal line at an angle of 225±20 after the first part.


Shape at middle position 


• Whenever it comes in the middle of a ligature, there is a small upward diagonal line/curve at an angle of 135±20, followed by a small horizontal line from left to right and finally a small downward diagonal line/curve at an angle of 335±20. This makes a loop.


Shape at end position 


• At the end of a ligature, the first part remains the same as the middle shape, and after the loop there is a semicircle (down-left + right + up) with a diameter of 7 to 8 pixels (the same as the second part of the Aien shape, without the cusp).


Fay class 


Two characters are present in this class, named Faay and Qaaf.
Basic shape 


“Faay”: starting from the bottom, move left, up, right and then down, making a circle (loop) with a radius of 2 to 3 pixels. After the loop, the shape is the same as Kashti.


“Qaaf”: the loop remains the same, and after the loop there is a big semicircle (right + down + left + up) with a diameter of 7 to 8 pixels.
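The Fay, Meem and Gol Hay descriptions differ partly in loop size: a radius of 2 to 3 pixels for the Fay and Meem loops, and 4 to 5 pixels for Gol Hay. A rough size estimate is sketched below; the centroid-distance estimator and the function names are our assumptions, and real pixel thresholds would depend on device resolution.

```python
import math


def loop_radius(points):
    """Estimate a closed loop's radius as the mean distance of its samples
    from their centroid (assumes roughly even sampling around the loop)."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return sum(math.hypot(x - cx, y - cy) for x, y in points) / n


def classify_loop(points):
    """'small' for a 2-3 px radius (Fay, Meem), 'large' for 4-5 px (Gol Hay), else None."""
    r = loop_radius(points)
    if 2 <= r <= 3:
        return "small"
    if 4 <= r <= 5:
        return "large"
    return None


# A circle of radius 2.5 sampled at 12 evenly spaced points.
circle = [(2.5 * math.cos(2 * math.pi * k / 12), 2.5 * math.sin(2 * math.pi * k / 12))
          for k in range(12)]
print(classify_loop(circle))  # small
```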


Shape at start position 


• If either character occurs at the start, the shape is only the first part, i.e. the loop.

• If Fay is followed by a Meem- or Yaa-class character, the next character starts immediately after the loop.

• If Fay is followed by a Jeem-class character, there is a small diagonal line at an angle of 225±20 after the loop.

Shape at middle position

• The shape in the middle is the same as the shape at the start position, only connected with the previous character. All conditions remain the same as for the start position.


Shape at end position 
• At the end of a ligature, the shape is the same as the basic shape.
Kaaf class


Two characters are present in this class, named Kaf and Gaaf.
Basic shape 


At the start there is a medium vertical line instead of a short vertical line; the rest is the same as the basic shape of the Kashti class.


Shape at start position 


• If a Jeem-, Ray- or Yaa-class character, or any class having a loop, follows Kaaf, its shape is a long vertical line (top to bottom, 270±20) followed by a short horizontal line (right to left, 180±20). There is no cusp after it.


• If Alif or Laam follows a Kaaf-class character, its shape is the same as the start position of the Fay class, i.e. a loop. All conditions of Fay at the start position remain the same.


• For all other cases, it is a semicircle (right + downward + left + upward) with a straight vertical line, and a cusp at the end.


Shape at middle position 


• If Kaaf comes in the middle, there is a vertical line from top to bottom at an angle of 90±20, then a downward stroke at an angle of 270±20, then a small semicircle (right + downward + left + upward). There is a cusp at the start as well as at the end.


• If Alif or Laam follows a Kaaf-class character, its shape is the same as the middle position of the Fay class, i.e. a loop. All rules for the shape of Fay at the middle position remain the same.


Shape at end position 
• At the end position, the shape is the same as the isolated shape but with a cusp at the start.


Laam class 


Two characters are present in this class, named Lam and Nun.


Basic shape 


“Nun”: it is the same as the last part of Seen, i.e. a semicircle (right + down + left + up) with a diameter of 7 to 8 pixels.


“Lam”: at the start there is a vertical line from top to bottom at an angle of 270±20; the remaining part is the same as Nun.


Shape at start position 
“Nun”: at the start of any ligature, its shape is the same as the shape of Kashti at the start position.

• All conditions associated with the shape of Kashti at the start position remain the same.


“Lam”: at the start of any ligature, its shape is the same as the shape of Kaaf at the start position.

• All conditions associated with the shape of Kaaf at the start position remain the same, except that when Alif follows Lam, its shape is a vertical line followed by a short horizontal line (right to left).


Shape at middle position 


“Nun”: in the middle of any ligature, its shape is the same as the shape of Kashti at the middle position. All conditions associated with the shape of Kashti at the middle position remain the same.

“Lam”: in the middle of any ligature, its shape is the same as the shape of Kaaf at the middle position; all conditions associated with the shape of Kaaf at the middle position remain the same, except that when Alif follows Lam, its shape is a vertical line followed by a short horizontal line (right to left).


Shape at end position 

• At the end position, the shape is the same as the isolated shape but with a cusp at the start.

Meem class 


One character is present in this class, named Meem.
Basic shape 


It starts from the left. Its shape is a diagonal line/curve (2 to 3 pixels) at an angle of 30±10, followed by a small downward curve that moves back towards the starting location, making a loop. After the loop there is a small horizontal line, followed by a long vertical line from top to bottom, the same as Alif.


Shape at start position 
Starting from the top, move left, down, right and then up, making a circle (loop) with a radius of 2 to 3 pixels.
• If an Alif-, Jeem-, Meem- or Yaa-class character follows Meem, the shape of the next character starts immediately.

• For all other cases, after the loop there is a diagonal line at an angle of 225±20.
Shape at middle position

• The shape is the same as at the start position, joined with the previous character. All conditions of the shape at the start position remain the same.

Shape at end position 

• The loop remains the same as in the middle position. After the loop, the shape is the same as the basic shape after the loop.


Wao class 
One character is present in this class, named Wao.
Basic Shape 


First there is a loop, the same as Fay; after the loop there is a curve, the same as Daal, facing towards the left.


Shape at start position 
• Remains the same as the isolated shape.


Shape at middle position 

• Never occurs in the middle position of a ligature.
Shape at end position 

• Remains the same as the isolated shape.
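Taken together, the Alif, Daal, Ray and Wao descriptions (none of which occurs in the middle position) determine where ligatures break, and hence which positional shape each character takes. A sketch assuming a word is given as a sequence of class names in writing order; it follows only the joining behaviour stated in this paper, and the function names are hypothetical.

```python
# Classes that, per the class descriptions, never occur in the middle of a
# ligature: they join to the preceding character but not to the next one.
NON_JOINING = {"Alif", "Daal", "Ray", "Wao"}


def _labels(n):
    """Position labels for a ligature of n characters."""
    if n == 1:
        return ["isolated"]
    return ["start"] + ["middle"] * (n - 2) + ["end"]


def assign_positions(word_classes):
    """Split a word, given as class names in writing order, into ligatures
    and label each character's position."""
    positions, ligature_length = [], 0
    for cls in word_classes:
        ligature_length += 1
        if cls in NON_JOINING:  # this character closes the current ligature
            positions.extend(_labels(ligature_length))
            ligature_length = 0
    if ligature_length:
        positions.extend(_labels(ligature_length))
    return positions


# "Pakistan": Pay Alif | Kaaf Seen Tay Alif | Nun
pakistan = ["Kashti", "Alif", "Kaaf", "Seen", "Kashti", "Alif", "Laam"]
print(assign_positions(pakistan))
# ['start', 'end', 'start', 'middle', 'middle', 'end', 'isolated']
```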


Gol Hay class 

One character is present in this class, named Gol Hay.
Basic Shape 

• Its shape is a circle, starting from the right and moving left and downward, then right and upward back to the starting position, making a loop with a radius of 4 to 5 pixels.


• Alternatively, a curve at an angle of 235±20, then a movement of 2 to 3 pixels to the right, and at the end a curve at an angle of 60±20. There is an intersection point, making a loop.
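Several shapes (Swad, Tua, Aien, Fay, Meem, Gol Hay) are characterized by a loop, and the alternative Gol Hay description locates it at an intersection point. One common way to detect such a loop in a polyline is to test non-adjacent segments for a proper crossing. A sketch; the orientation-based crossing test is a standard computational-geometry technique chosen here for illustration, not the paper's method, and the function names are hypothetical.

```python
def _orient(p, q, r):
    """Twice the signed area of triangle pqr: positive if r is left of p->q."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def segments_cross(p1, p2, q1, q2):
    """True if segments p1-p2 and q1-q2 cross at a point interior to both."""
    d1, d2 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    d3, d4 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0


def has_loop(points):
    """Detect a loop as a crossing between two non-adjacent stroke segments."""
    segments = list(zip(points, points[1:]))
    for i in range(len(segments)):
        for j in range(i + 2, len(segments)):  # adjacent segments share a point
            if segments_cross(*segments[i], *segments[j]):
                return True
    return False


crossing = [(0, 0), (4, 4), (4, 0), (0, 4)]  # first and last segments cross
open_curve = [(0, 0), (1, 0), (2, 1), (3, 1)]
print(has_loop(crossing), has_loop(open_curve))  # True False
```

Touching endpoints and collinear overlaps are deliberately not counted, so a loop closed exactly at its starting sample would need an extra check.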


Shape at start position 


At the start, its shape is the same as the shape of Kashti at the start position.
Shape at middle position 


A line at an angle of 235±20, followed by a curve at an angle of 60±20, with a cusp point between the two.
Shape at end position

At the end of a ligature, its shape is a small line at an angle of 235±20, followed by a small curve and a medium line at an angle of 150±20. When the height of this line approaches the starting point, there is again a curve, followed by a small line at an angle of 235±20.


Do Chashmi Hay class 


One character is present in this class, named Do Chashmi Hay.


Basic shape 


There is a small line at an angle of 235±20, followed by a loop the same as the loop of Swad. After the first loop, there is another loop, the same as Swad, intersecting the previous one and moving back to the starting position. At the end, there is a small horizontal line.

Shape at start position 


Same as the isolated shape.
Shape at middle position
Same as the isolated shape.


Shape at end position 
Same as the isolated shape.


Choti Yaa class


One character is present in this class, named Choti Yaa.


Basic Shape 
A small line/curve at an angle of 245±20, followed by a curve at an angle of 315±20. Then there is a semicircle, the same as in the Fay class.


Shape at start position 


• When it comes in the starting position, its shape remains the same as the shape of Kashti at the starting position.
• All conditions of the shape of Kashti at the starting position remain the same here.


Shape at middle position 
• When it comes in the middle position, its shape remains the same as the shape of Kashti at the middle position.

• All conditions of the shape of Kashti at the middle position remain the same here.


Shape at end position 


• If Choti Yaa follows a Kashti-, Jeem-, Fay-, Kaaf- or Laam-class character, its shape is only the last part, i.e. the semicircle.


• For all other cases, its shape remains the same as the isolated shape.
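The end-position rule for Choti Yaa reduces to a membership test on the preceding class. A minimal sketch; the shape labels and the name `choti_yaa_end_shape` are hypothetical.

```python
# Classes after which a final Choti Yaa keeps only its last part (the semicircle).
SEMICIRCLE_ONLY_AFTER = {"Kashti", "Jeem", "Fay", "Kaaf", "Laam"}


def choti_yaa_end_shape(previous_class):
    """Return the shape label for Choti Yaa at the end of a ligature,
    given the class name of the character it joins."""
    if previous_class in SEMICIRCLE_ONLY_AFTER:
        return "semicircle"  # only the final semicircle is written
    return "isolated"        # full isolated shape


print(choti_yaa_end_shape("Kaaf"))  # semicircle
print(choti_yaa_end_shape("Seen"))  # isolated
```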


Bari Yaa class 


One character is present in this class, named Bari Yaa.


Basic Shape 
A small line at an angle of 245±20, followed by a big horizontal line from left to right at an angle of zero.

• Alternatively, at the start there is a short vertical line from top to bottom, and the rest is the same as above.


Shape at start position 
Same as the shape of Choti Yaa at the starting position.


Shape at middle position 
Same as the shape of Choti Yaa at the middle position.


Shape at end position 
Its shape remains the same as the isolated shape.


Results and conclusion 


We tested the online Urdu OCR developed earlier by Husain et al. (2007) with 10 native Urdu writers and found a number of ambiguities in writing variations that result in recognition failures by the engine. Moreover, the standard writing rules described above were not devised with online input in view and, as such, are not efficient for writing with a stylus or digital pen. It is therefore recommended that the writing rules be modified with stroke-based convenience and efficiency for online input devices in mind.
Urdu is difficult to learn and write. If proper writing rules are not followed in online input, the recognition rate decreases, because a shape written in a different stroke order can look the same while its stroke sequence is completely changed. We also found that many pen ups are required in writing different ligatures, which makes the recognition process slow: for some characters the writer first writes one part, then lifts the pen and writes the remaining stroke. Writing such a stroke without a pen up would be more convenient and efficient.


Future direction of research 


Future work includes devising efficient and convenient rules for online input, which will make the recognition engine more efficient. Also, secondary strokes, i.e. diacritics, are not discussed in detail in this paper, so a future direction of research would be to devise rules that include diacritical marks.